Introduction

The International Monetary Fund (IMF or ‘the Fund’) is a global organisation of 190 countries that was set up to promote monetary cooperation, financial stability, international trade, economic reform and poverty reduction. It is funded by member countries, who can draw on the Fund’s resources if they encounter financial difficulties (IMF 2022). The IMF also maintains a permanent and visiting staff focused on economic research. One of the IMF’s flagship research publications is the Working Paper series, for which online records are available back to the early 1990s.

We analyse the Fund’s research as reflected in the Working Paper series. We use a topic modelling approach built on the embeddings from the popular BERT transformer architecture. Overall, the topic model produces a set of detailed, coherent topics that summarise IMF research over the past 30 years. These topics are robust to several variations of the Working Paper corpus (using paper summaries, titles and subject tags). Many topics relate to issues firmly within the IMF’s remit, such as exchange rates, monetary policy, economic growth and development, and reform (e.g. related to labour markets, demography and pensions). However, over the past 10 years contemporary social issues have also gained prominence in IMF research. This is reflected in a surge in research on topics like inequality, climate change and, more recently, COVID-19, to the extent that these are now among the most popular topics of IMF research over the past 30 years.

Of course our analysis only scratches the surface of the information the topic model has summarised from the Working Papers. To encourage the reader to explore the outputs and reach their own conclusions, we have included interactive charts in this report where possible.

Literature

There is a growing literature that uses text mining techniques to analyse the content of economic and news publications. A detailed review of the literature can be found in Avetisyan (2021).

One strand of literature uses the frequency of certain types of words to construct economic indices. A prominent example is the policy uncertainty index, which matches the occurrence of a pre-defined set of words in news articles (Baker, Bloom, and Davis 2016). Other examples construct economic sentiment indices by counting the frequency of words appearing in a large dictionary of positive and negative terms (Nguyen, La Cava, and others 2020). The benefit of these approaches is that they are transparent and simple to understand. However, they are also inflexible in the sense that the ‘algorithm’ underlying the analysis is pre-defined and does not learn from the data.

Another strand of literature uses unsupervised methods, such as latent semantic analysis (LSA) or latent Dirichlet allocation (LDA), to assess the content of economic statements and relate this to macroeconomic data. Much of this analysis has focused on central bank communications, especially for the US Federal Reserve. For example, Hansen, McMahon, and Prat (2018) use LDA topic modeling to assess how transparency reforms have affected the content of FOMC deliberations. Battaglia and Salunina (2020) use a dynamic LDA topic model to construct topic representations that can be used as an input to forecast macroeconomic conditions. Unsupervised methods such as LDA are useful because they ‘learn’ their parameters from otherwise unstructured text data. These methods have two main downsides though. One is that they use a bag of words representation of documents, which ignores the (obviously relevant) ordering of words. The other is that their outputs are highly sensitive to the tuning of hyperparameters (including the number of topics) and also how the text is pre-processed.

Related to the IMF, there is a small literature that employs text mining techniques. For instance, Anderson et al. (2021) use clustering and dimensionality reduction (PCA) on IMF communiques and constituency statements to analyse how the priorities of IMF members have changed over time. In addition, Mihalyi and Mate (2019) analyse IMF Article IV reports (country level reports and recommendations) to assess the nature of IMF country surveillance, while IMF (2019) analyses the IMF’s social spending surveillance. However, to our knowledge, there is no research to date that looks at the content of IMF publications from a topic modelling perspective.

Data

Web scraping and data collection

We build our corpus by web-scraping the official IMF website. The scraper uses the Selenium package in Python and operates in two steps. First, it cycles through each listing page of working papers and collects the URL for each working paper. Second, it opens the URL for each working paper and appends all of the metadata on that page to a data frame. The result is a data frame containing the metadata for every working paper. The code underlying the scraper is attached and could easily be extended to download the PDF of each working paper, though we did not have a use for this.
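
The two steps can be sketched as follows. The listing URLs, CSS selectors and field names below are illustrative assumptions, not the IMF site's actual markup (the attached scraper targets the real page structure); `driver` is a Selenium WebDriver instance:

```python
from dataclasses import dataclass, asdict

import pandas as pd


@dataclass
class PaperMeta:
    title: str
    authors: str
    date: str
    summary: str
    pages: str
    subjects: str


def to_frame(records):
    """Assemble the scraped metadata into one data frame, one row per paper."""
    return pd.DataFrame([asdict(r) for r in records])


def scrape(driver, listing_urls):
    # Step 1: cycle through each listing page and collect the paper URLs.
    paper_urls = []
    for url in listing_urls:
        driver.get(url)
        # "css selector" is Selenium 4's locator strategy string (By.CSS_SELECTOR).
        paper_urls += [a.get_attribute("href")
                       for a in driver.find_elements("css selector", "a.paper-link")]

    # Step 2: open each paper page and append its metadata fields.
    def field(css):
        return driver.find_element("css selector", css).text

    records = []
    for url in paper_urls:
        driver.get(url)
        records.append(PaperMeta(title=field("h1"), authors=field(".authors"),
                                 date=field(".pub-date"), summary=field(".summary"),
                                 pages=field(".pages"), subjects=field(".subjects")))
    return to_frame(records)
```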

Using the scraper, we collected metadata for 7,246 working papers published between 1990 and 2022. We chose to collect data for the working paper series (as opposed to other publications such as Staff Discussion Notes or Article IV country reports) because the series is available for the longest continuous period of time.

Figure 1: Wordcloud of IMF Working Papers


Chart 1 shows the number of papers collected for each year of our sample. Early in the sample period there were few papers to scrape, but the count increased sharply from the mid-1990s and stabilised at around 200-300 papers per annum from the mid-2000s.

Subject to availability, the scraper collected the following data for each paper:

  • Title
  • Author(s)
  • Publication date
  • Summary (abstract)
  • Page length
  • Subject tags

Summary statistics

The collection of paper summaries forms the main corpus for our paper. We use the summaries, rather than the full text of each paper, because they are computationally efficient to process and should capture the main content and findings of each article. Chart 2 shows that the majority of documents in our corpus have 100-200 words in the summary, which is manageable. We supplement this main corpus with corpora formed from the titles and (manually assigned) subject tags of the papers (see below), which are used to check the robustness of our results.

The subject tags assigned to papers provide another avenue for robustness checking. Chart 3 shows the distribution of subject tags across papers, with the majority of papers having fewer than 10 subject tags. However, the assignment of subject tags appears to occur on an ad hoc basis, so the tags required a significant amount of cleaning to distill them into a set of usable subjects. Chart 4 shows the most popular subject tags (after cleaning), by their frequency of appearance. The most popular subject tags correspond to topics firmly in the IMF’s remit: central banks, fiscal policy, exchange rates, banks and the financial sector, inflation, the labour market and the financial crisis.

Pre-processing

Text pre-processing is not strictly required for our modeling approach and in some more complex natural language applications (such as machine translation or word prediction) may even be undesirable (Bricken 2021). However, topic modeling is a relatively simple NLP task and pre-processing can help improve the interpretability of our results. Intuitively, topic modeling simply groups documents based on words (or combinations of words) that are common across documents (and unique to that group). Words that are slight variations of each other should play the same role in describing topics and be represented by the same token. Furthermore, punctuation, symbols and stop words are unlikely to contribute to identifying topics in our corpus and simply increase the dimensionality of our data. With this in mind, we apply the following pre-processing steps to the summary and title fields:

  • Token removal: punctuation, symbols, numbers and default stopwords are removed.
  • Lemmatisation: lemmatisation is appropriate for topic modeling since assigning a document to a topic depends on matching exact strings across documents (Schumacher 2019). For example, without lemmatisation ‘exchange rates’ and ‘exchange rate’ are identified as distinct tokens; clearly, it is best to eliminate this duplication.
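
To make these two steps concrete, here is a minimal, self-contained sketch; the stopword list and lemma map are toy stand-ins for the full dictionaries a library such as spaCy or NLTK would supply:

```python
import re

STOPWORDS = {"the", "of", "and", "a", "to", "in", "is", "are", "on", "for"}
# Toy lemma map for illustration only; in practice a full lemmatiser
# (e.g. spaCy or NLTK's WordNetLemmatizer) covers the whole vocabulary.
LEMMAS = {"rates": "rate", "banks": "bank", "countries": "country"}


def preprocess(text):
    # Token removal: keep alphabetic tokens only, which drops punctuation,
    # symbols and numbers; then filter out stopwords.
    tokens = re.findall(r"[a-z]+", text.lower())
    tokens = [t for t in tokens if t not in STOPWORDS]
    # Lemmatisation: map inflected forms to a common token so that
    # 'exchange rates' and 'exchange rate' match exactly across documents.
    return [LEMMAS.get(t, t) for t in tokens]


preprocess("Exchange rates in 190 countries!")
# -> ['exchange', 'rate', 'country']
```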

For the subject tags, we use the same lemmatiser and ngrams to extract popular multi-word subject tags. To help with interpretability, we also manually consolidated some of the similar subject tags into groups (e.g. exchange rates and foreign exchange both map to exchange rates in the subject tags).

Model Methodology

Transformer models

Transformer models are a neural network architecture that has excelled in a range of natural language processing (NLP) tasks. Transformers use the concept of attention to learn short- and long-term dependencies between words in a sequence (Vaswani et al. 2017). The transformer model outputs a vector of embeddings (in Euclidean space) for each vocabulary word it is trained on, which is suitable for deployment in many downstream machine learning tasks. The embeddings contain a representation of a word’s frequency and, via the attention mechanism, its positional context.

The power of the transformer architecture is that it can be pre-trained on a large corpus of text, say all of Wikipedia, to produce word embeddings. The user can then ‘fine-tune’ the embeddings using their (comparatively tiny) corpus. This allows them to take advantage of embeddings trained on a corpus much larger than most would have access to for their NLP task.

The transformer we have chosen is the popular pre-trained BERT (Bidirectional Encoder Representations from Transformers) architecture (Devlin et al. 2018). BERT augments the transformer architecture of Vaswani et al. (2017) by making the context around a word of interest bi-directional (i.e. a word appearing to the left of the word of interest is treated differently to the same word appearing to the right). This allows the model to predict words that appear on both sides of a word of interest (in the original transformer architecture, the model could only predict words that occur after a word of interest).

In BERT, each word in the vocabulary actually contains multiple embeddings, which capture the different contexts in which a word might appear. This is attractive because it moves away from the ‘bag of words’ approach to text mining (which ignores the context around a word). The downside is that the embeddings from BERT are very high dimensional, which can be a problem for some downstream tasks.

There are many versions of BERT trained on different corpora in many languages. We use the ‘all-mpnet-base-v2’ model (HuggingFace 2022), a general-use, English-language model suitable for encoding sentences and small paragraphs. This model is trained on data from various sources, including Reddit, WikiAnswers and Stack Exchange. It outputs 768-dimensional embeddings. According to the documentation, the embeddings are best used downstream in tasks such as sentence similarity, information retrieval or clustering. Since we will employ clustering to generate our topics, this is a suitable model for us.

Topic modelling with embeddings

We deploy the fine-tuned embeddings from BERT on our corpus to generate a topic model, following the procedure outlined in Grootendorst (2022). This technique is very new and is inspired by the Top2Vec model (Angelov 2020).

Grootendorst (2022) outlines the following steps (summarised in Figure 3):

Dimension reduction: Our BERT word embeddings have 768 dimensions. Many clustering algorithms handle high dimensional data poorly, since in high dimensions the ‘distance’ between all points tends to converge to the same value (Aggarwal, Hinneburg, and Keim 2001). To address this, Grootendorst uses the UMAP (Uniform Manifold Approximation and Projection) algorithm to project the embeddings onto a lower dimensional manifold (McInnes, Healy, and Melville 2018). This algorithm is a good choice because it preserves much of the local structure of the high-dimensional data in the lower-dimensional space.
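
The distance-concentration problem motivating this step is easy to demonstrate: sampling uniform random points and comparing the farthest to the nearest pairwise distance shows the spread collapsing as dimensionality grows (here at our embeddings' 768 dimensions):

```python
import numpy as np

rng = np.random.default_rng(0)


def distance_spread(dim, n=500):
    """Ratio of the farthest to the nearest distance from a reference point.
    In high dimensions distances concentrate, so the ratio shrinks toward 1."""
    points = rng.random((n, dim))
    d = np.linalg.norm(points - points[0], axis=1)[1:]  # distances to point 0
    return d.max() / d.min()


distance_spread(2)    # large spread: near and far points are very different
distance_spread(768)  # max/min distances are close together in 768 dims
```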

Clustering: pass the reduced embeddings to the HDBSCAN (hierarchical DBSCAN) clustering algorithm. According to Grootendorst, HDBSCAN works well with the output from UMAP. The algorithm learns the optimal number of clusters and does not force an assignment for every observation (observations can be labeled as ‘outliers’); a fixed cluster count and forced assignment are downsides of other clustering algorithms that can make their results challenging to interpret consistently. The clusters chosen by HDBSCAN form the ‘topics.’

Topic creation: to interpret topics, we need to distinguish between them in a meaningful way. Grootendorst constructs a class-based TF-IDF measure to handle this. This method combines all documents within each cluster into their own ‘class’ (so each cluster is represented by one concatenated ‘class’ document). Then, a TF-IDF representation is applied to the class-level documents. The words with the top TF-IDF score from each class summarise each topic. Using the top scoring TF-IDF words is a clever way to systematically identify topics, because these are the words that are most ‘unique’ in each cluster. Figure 2 summarises the class TF-IDF calculation.

Figure 2: Class-based TF-IDF

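
A minimal implementation of the class-based TF-IDF score, following the formula in Grootendorst (2022): term frequency within a class, weighted by log(1 + A/f(t)), where A is the average number of words per class and f(t) is the term's frequency across all classes (the within-class normalisation below is one possible choice):

```python
from collections import Counter

import numpy as np


def class_tfidf(classes):
    """classes: list of token lists, one concatenated 'class' document per
    cluster. Returns the vocabulary and a (classes x vocab) score matrix."""
    vocab = sorted({t for c in classes for t in c})
    counters = [Counter(c) for c in classes]
    counts = np.array([[cnt[t] for t in vocab] for cnt in counters], float)
    tf = counts / counts.sum(axis=1, keepdims=True)  # term frequency within class
    A = counts.sum() / len(classes)                  # average words per class
    idf = np.log(1 + A / counts.sum(axis=0))         # rarer across classes -> higher
    return vocab, tf * idf


# Two toy 'class' documents: the top-scoring word summarises each topic.
classes = [["exchange", "rate", "rate", "policy"],
           ["tax", "revenue", "policy", "policy"]]
vocab, scores = class_tfidf(classes)
top_word = vocab[int(np.argmax(scores[0]))]  # most 'unique' frequent word in class 0
```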

Topic reduction: Although HDBSCAN chooses the number of topics automatically, topics may overlap significantly or there may be too many topics for the user to interpret meaningfully. We can identify overlapping topics using an ‘intertopic distance map,’ which projects the embeddings into a two-dimensional space (using UMAP) and then plots topics according to their cluster centroids (the size of a topic in the map reflects the number of documents assigned to it). Based on this, one can decide whether to reduce topics. One way to reduce topics is to force HDBSCAN to use a certain number of topics. Another is to combine topics where the average documents in each cluster have a cosine similarity above some threshold (the default is 0.9) (Angelov 2020). We reduce the number of topics via the cosine similarity method.
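
The cosine-similarity reduction can be sketched as a recursive merge of topic vectors; averaging merged vectors here is a simplification of the Top2Vec-style procedure:

```python
import numpy as np


def merge_similar_topics(centroids, threshold=0.9):
    """Recursively merge the most similar pair of topic vectors until no
    pairwise cosine similarity exceeds the threshold."""
    topics = [np.asarray(c, float) for c in centroids]
    while True:
        best, pair = threshold, None
        for i in range(len(topics)):
            for j in range(i + 1, len(topics)):
                a, b = topics[i], topics[j]
                sim = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
                if sim > best:
                    best, pair = sim, (i, j)
        if pair is None:          # nothing left above the threshold
            return topics
        merged = (topics[pair[0]] + topics[pair[1]]) / 2  # average the pair
        topics = [t for k, t in enumerate(topics) if k not in pair] + [merged]
```

For example, two near-identical topic vectors collapse into one while a dissimilar vector survives untouched.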

Maximal marginal relevance (MMR): once the topics and their word representations are complete, an optional step is to assess the coherence of the topic descriptions. A common issue is that the top words can be very similar (even after we have lemmatised). This is especially so when using bigrams. For example, in a topic about exchange rates three of the top terms might be exchange rate, exchange and rate! Clearly, this makes some of the top terms redundant for interpretation. One way to correct this problem is to apply maximal marginal relevance. This is a ranking technique which, given a list of terms, attempts to balance between their similarity and diversity (Carbonell and Goldstein 1998).
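
A sketch of MMR over a topic's candidate terms (the term vectors and query below are toy values; in the actual model the vectors come from the embeddings and the ‘query’ is the topic representation):

```python
import numpy as np


def mmr(term_vecs, query_vec, k=5, diversity=0.7):
    """Maximal marginal relevance: pick k terms trading off relevance to the
    topic (query) against redundancy with already-chosen terms
    (Carbonell and Goldstein 1998)."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

    terms = list(term_vecs)
    # Seed with the single most relevant term.
    chosen = [max(terms, key=lambda t: cos(term_vecs[t], query_vec))]
    while len(chosen) < k and len(chosen) < len(terms):
        def score(t):
            relevance = cos(term_vecs[t], query_vec)
            redundancy = max(cos(term_vecs[t], term_vecs[c]) for c in chosen)
            return (1 - diversity) * relevance - diversity * redundancy
        remaining = [t for t in terms if t not in chosen]
        chosen.append(max(remaining, key=score))
    return chosen
```

With near-duplicate terms like ‘exchange rate,’ ‘exchange’ and ‘rate,’ a high diversity setting selects ‘exchange rate’ and then jumps to a distinct term such as ‘regime’ rather than the redundant variants.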

Dynamic topics: finally, we add a dynamic element to the topic model by specifying a fixed number of periods (time stamps) to split the data over. To calculate the time-specific representations of a topic, term frequencies are calculated for the documents in each topic and time period t. These are then averaged with the global (all time) and period t-1 class TF-IDF representations to help with stability and persistence in the dynamic process. The benefit of this approach is that we get a distinct set of topic words for each topic in each time slice.
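
A simplified reading of this smoothing step, averaging each time slice's representation with the previous slice's and the global one (equal weights are our illustrative assumption):

```python
import numpy as np


def smooth_topic_reps(local_reps, global_rep):
    """Average each slice's c-TF-IDF vector with the previous slice's and the
    global (all-time) vector, for stability and persistence over time."""
    smoothed, prev = [], np.asarray(global_rep, float)  # seed with the global rep
    for rep in local_reps:
        rep = np.asarray(rep, float)
        smoothed.append((rep + prev + np.asarray(global_rep, float)) / 3)
        prev = rep
    return smoothed
```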

Figure 3: BERTopic algorithm


Results

We pass our pre-processed IMF summaries through the BERTopic algorithm outlined above and set parameters as follows:

For each document, the algorithm outputs a vector of probabilities corresponding to each topic. Topics are assigned to documents according to the highest probability. Table 1 shows the number of documents assigned to the top 10 topics. The highest scoring class TF-IDF words describe each topic. Topic -1 contains all the documents not assigned to a topic. The initial run of the algorithm outputs 65 topics overall, with around one third of the documents assigned to the outlier topic.

Each topic in Table 1 is coherent and relates mostly to international macroeconomics, which comes as no surprise. IMF Working Papers most commonly discuss exchange rates, monetary policy, fiscal policy and sovereign debt. It is also interesting to see topics on income inequality and climate change feature in the top 10, despite these being more contemporary issues.

Topic  Count  Name
   -1   2670  -1_financial_policy_paper_bank
    0    328  0_exchange_exchange rate_rate_real exchange
    1    296  1_inflation_monetary_monetary policy_policy
    2    246  2_tax_revenue_vat_income
    3    237  3_debt_sovereign_bond_spread
    4    235  4_fiscal_fiscal policy_consolidation_rule
    5    215  5_wage_labor_unemployment_labor market
    6    153  6_inequality_income_poverty_education
    7    123  7_shock_cycle_business cycle_business
    8    110  8_climate_subsidy_emission_carbon

Table 1: Topic assignments

Figure 4 visualises the topics using an intertopic distance map. The intertopic distance map projects the average embedding for each topic into a two-dimensional space (using UMAP). The size of each topic ‘bubble’ represents the number of documents assigned to the topic. Topics that appear closer together on the map are ‘more similar’ (in the sense that their embeddings have a higher cosine similarity). Since many of the topics overlap, you can click and drag on a cluster of topics to explore a particular area of the map in more detail.

Despite a degree of overlap between the topics, the clusters reveal that most of the 65 topics summarise a particular economic issue coherently. Additionally, the overlapping topics represent related (but also distinct) economic issues.

For example, the cluster of topics in the bottom left of the map is about fiscal policy, sovereign debt and taxes, but also commodity price shocks, oil and climate change. Each of these topics is distinct, but they are related and therefore likely to use similar language. One could surmise that working papers in this cluster commonly relate to governments that rely on resource extraction for export and tax revenues. A common issue these governments (often in emerging markets) face is how to maintain sustainable fiscal policy and debt servicing when they are exposed to commodity price shocks. A more contemporary question has been how these countries will adapt to the transition away from fossil fuels in response to climate change.

Similar narratives appear for the other clusters of related topics. Some examples are aid, remittances, natural disasters and IMF support programs (bottom middle), demography, pensions and savings (bottom middle left) or trade, financial integration, dollarisation, foreign currency and capital flows (also bottom middle left).

Figure 4: Intertopic distances

Even though many of the overlapping topics have distinct interpretations, some are still very similar. Figure 5 shows the distribution of pairwise cosine similarities across different topics (calculated from their embeddings). We can threshold the cosine similarity to reduce the number of topics: all topics with a pairwise score above the threshold (which we set at 0.9) are combined, and the procedure repeats until no similarities remain above the threshold. This recursion explains why, despite observing few similarity scores above 0.9 in Figure 5, applying the technique reduces the number of topics from 65 to 32.

Figure 5: Distribution of cosine similarities


The next step is to choose the number of words used to describe each topic. To do this, we apply the ‘elbow’ method to the class TF-IDF scores (term scores). Figure 7 plots the term scores for the top words in each topic. The idea is to choose the number of top words based on the inflection point at which the marginal contribution of an additional word becomes roughly constant (the elbow). From Figure 7, this appears to happen for most topics at between 4 and 6 words, so we describe each topic by its top 5 words.
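
One simple way to operationalise this elbow choice programmatically (the 10%-of-largest-drop cutoff is an illustrative assumption; in practice we chose the cut-off by eye from Figure 7):

```python
def elbow(scores):
    """Given descending term scores, return the word count after which the
    marginal drop flattens out: the first index where the decrease falls
    below a fraction of the largest decrease."""
    drops = [scores[i] - scores[i + 1] for i in range(len(scores) - 1)]
    cutoff = 0.1 * max(drops)
    for i, d in enumerate(drops):
        if d <= cutoff:          # marginal contribution now roughly constant
            return i + 1
    return len(scores)


elbow([0.9, 0.6, 0.4, 0.3, 0.29, 0.28])  # flattens after the 4th word
```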

Figure 7: Term scores

Figure 9 shows the top words for the same set of topics after applying MMR (with a relatively high value for the diversity parameter to ensure a diverse representation of words). MMR has consolidated the top words for all of the topics with repeated top words, allowing for a richer interpretation using the same number of key words. For instance, in our topic on exchange rates we can now see that the research relates to exchange rate regimes and (central bank) intervention in the foreign currency market. For the topic on labour markets, we can now posit that IMF research often relates to reform (which is more specific to the IMF’s remit).

Figure 9: Top words per topic (after MMR)

Finally, we segment each topic into time slices. The dynamic topic model requires us to specify the number of time slices to apply. We choose 10 time slices, corresponding to segments of 3 years. All else equal, longer segments help to reduce within-topic volatility, especially for smaller topics. The trade-off is having fewer documents to represent a topic within a particular time slice. This forms the final output from the model, which we analyse in the next section. The dynamic topic model over all topics is shown in Figure A2 in the appendix.

Analysis: IMF Working Papers over time

In this section we use the dynamic topic representations to discuss two instances of how IMF Working Papers have attended to various economic issues over time. Given the 65 topics, one could construct many case studies involving groups of topics; we have chosen to discuss only two as an illustration. We observe that IMF Working Papers appear largely reactive to economic events, which makes sense given they are longer-term pieces of research.

Case Study 1: Financial crises

Figure 10 shows selected topics related to exchange rates, fiscal and monetary policy and banking. There is a clear spike in research regarding exchange rates, foreign exchange intervention and sovereign debt from the late 1990s to early 2000s, around the time of the Asian Financial Crisis. The Asian Financial Crisis started in 1997 when Thailand unpegged its currency from the US Dollar, leading to large capital outflows from emerging Asian economies and significant depreciation in their currencies. In response to the financial crisis, the IMF provided support packages to the Indonesian, Korean and Thai economies.

Separately, we can also see an uptick in research about stress testing and banking crises in the years around the Global Financial Crisis (GFC). This coincides with greater regulatory scrutiny of the banking sector following the GFC. For the topic on monetary policy, the increase in papers around this period could reflect central banks deploying unconventional monetary policies (e.g. quantitative easing) in response to the crisis, many for the first time in the inflation-targeting era. For sovereign debt, the peak in research persists for around a decade, which makes sense given the European sovereign debt crisis followed the GFC in the early 2010s.

Figure 10: Financial crises

Case Study 2: Contemporary issues

Figure 11 shows some socially motivated aspects of economic research that have come to the fore over the past decade or so. These were traditionally issues outside of the IMF’s remit, and they show how the IMF is responsive to contemporary issues in the economic community. As we saw earlier, some of these topics are among the largest for our corpus. For instance, we can see a large increase in research on climate change over the past 10 years and, to a lesser extent, on natural disasters, which are a consequence of a changing climate. In addition, we also observe an uptick in research about inequality and gender over the past 10 years. Unsurprisingly, there has been a large spike in research about COVID-19 over the past couple of years (the line rises so sharply because there were no papers assigned to this topic from the early 2000s until recently).

Figure 11: Contemporary issues

Conclusion

In this project, we have used a topic modelling approach built on the embeddings produced by the popular NLP transformer architecture BERT. We apply this to a large corpus of IMF Working Papers published over the past 30 years. We demonstrate how this topic modelling architecture can be used to find and refine a set of topics from a corpus. Overall, we find a set of coherent, detailed topics that describe various aspects of IMF research. Unsurprisingly, much of the IMF’s research relates to topics in macroeconomics that are firmly within the IMF’s remit. However, we also observe a movement toward research on more contemporary social issues in economics, such as inequality and climate change. Our analysis of the output from this topic model only scratches the surface, and we invite the reader to explore the interactive graphs to discover more about topics of IMF research over time.

Aggarwal, Charu C, Alexander Hinneburg, and Daniel A Keim. 2001. “On the Surprising Behavior of Distance Metrics in High Dimensional Space.” In International Conference on Database Theory, 420–34. Springer.
Anderson, Gareth, Paolo Galang, Andrea Gamba, Leandro Medina, and Tianxiao Zheng. 2021. How Have IMF Priorities Evolved? A Text Mining Approach. International Monetary Fund.
Angelov, Dimo. 2020. “Top2Vec: Distributed Representations of Topics.” arXiv Preprint arXiv:2008.09470.
Avetisyan, Sergey. 2021. “How the Economy Is Modeled Linguistically and Hence Communicatively?” Available at SSRN 3960782.
Baker, Scott R, Nicholas Bloom, and Steven J Davis. 2016. “Measuring Economic Policy Uncertainty.” The Quarterly Journal of Economics 131 (4): 1593–1636.
Battaglia, Laura, and Maria Salunina. 2020. “Tracking the Economy Using FOMC Speech Transcripts.”
Carbonell, Jaime, and Jade Goldstein. 1998. “The Use of MMR, Diversity-Based Reranking for Reordering Documents and Producing Summaries.” In Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 335–36.
Devlin, Jacob, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding.” arXiv Preprint arXiv:1810.04805.
Grootendorst, Maarten. 2022. “BERTopic: Neural Topic Modeling with a Class-Based TF-IDF Procedure.” arXiv Preprint arXiv:2203.05794.
Hansen, Stephen, Michael McMahon, and Andrea Prat. 2018. “Transparency and Deliberation Within the FOMC: A Computational Linguistics Approach.” The Quarterly Journal of Economics 133 (2): 801–70.
IMF. 2019. “A Strategy for IMF Engagement on Social Spending.”
———. 2022. “About the IMF.”
McInnes, Leland, John Healy, and James Melville. 2018. “UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction.” arXiv Preprint arXiv:1802.03426.
Mihalyi, David, and Akos Mate. 2019. “Text-Mining IMF Country Reports–an Original Dataset.” Available at SSRN 3268934.
Nguyen, Kim, Gianni La Cava, and others. 2020. “RDP 2020-08: Start Spreading the News: News Sentiment and Economic Activity in Australia.” Reserve Bank of Australia Research Discussion Papers, no. December.
Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. “Attention Is All You Need.” Advances in Neural Information Processing Systems 30.